Media spend optimization using a cross-channel predictive model

ABSTRACT

A method, system, and computer program product for advertising portfolio management. The method forms process steps for determining effectiveness of marketing stimulations in a plurality of marketing channels included in a marketing campaign. The method commences upon receiving data comprising a plurality of marketing stimulations and respective measured responses, then determining from the marketing stimulations and the respective measured responses, a set of cross-channel weights to apply to the respective measured responses, where the cross-channel weights are indicative of the influence that a particular stimulation applied to a first channel has on the measured responses of other channels. The cross-channel weights are used in calculating the effectiveness of a particular marketing stimulation over an entire marketing campaign. The marketing campaign can comprise stimulations quantified as a number of direct mail pieces, a number or frequency of TV spots, a number of web impressions, a number of coupons printed, etc.

FIELD OF THE INVENTION

The disclosure relates to the field of advertising portfolio management and more particularly to techniques for media spend optimization using a cross-channel predictive model.

BACKGROUND

Advertising is big business. In today's global commerce arena, business managers must consider how to tout their products or services using various types of advertising. And, in the hyper-media world in which we live there is a dizzying array of possibilities to spend on advertising (e.g., TV, radio, print, mail, web, etc.). Often an advertising campaign will use multiple channels to establish brand awareness, entice, and convert advertising into action. Some advertising channels capture a direct correspondence between a placement and an action, and some do not. For example, contrast a TV ad placement with a webpage ad (e.g., banner ad, display ad, click-on coupon, etc.). In the webpage case, the precise distribution of the internet ad placements can be determined by the internet ad network provider since at the time an internet ad is displayed, quite a lot is known about the placement as well as the respondent. In the TV case, while it can be known that the ad placement was broadcast, it might not be known precisely who saw the ad. Perhaps only the share of households watching the program can be known.

For managing spend on advertising, advertisers want to know quite specifically how a particular ad placement resulted in a particular behavior by the viewer. In the domain of internet advertising, the details such as the location where the ad was placed, the time of day the ad was placed, responses or actions taken after the placement (e.g., click on an ad or coupon), or in some cases, very precise demographics of the respondent can be known, and can thus be delivered to the advertiser. However, when using many other forms of media, such data is often collectable only in aggregate. Yet advertisers strongly desire this level of precision for each specific placement, since the respective answers to “who, what, when” can be used by advertisers to tune their creatives and/or tune their placements.

In many forms of advertising media, clever placements can yield a relationship between stimulus and response (even if only measurable in aggregate). For example, a radio ad in the form of “Call 1-800-123-4567 today for this buy-1-get-two-free offer” might be broadcasted to three million morning commuters, but which specific commuters have heard the spot cannot be determined directly. Indirectly, however, one can measure the effectiveness of the spot by tallying the number of calls into the broadcasted telephone number “1-800-123-4567”.

Prior to the advent of internet advertising, a common expression repeated in advertising circles was, “Half the money I spend on advertising is wasted; the trouble is I don't know which half”. This expression (often attributed to John Wanamaker, b. 1838) illustrates how difficult it is to measure the effectiveness of traditional broadcast or mass advertising.

The problem of determining the effect of one or another type of traditional broadcast or mass advertising (e.g., by media, by channel, by time-of-day, etc.) has long been studied, yet legacy approaches fall short. Legacy approaches rely on a naïve one-to-one correspondence between an advertising placement and a measured response. If an increase in a particular spend (e.g., a radio spot) results in more responses (e.g., calls to the broadcasted 1-800 number) then a legacy approach would recommend to the advertiser to increase spend on those radio spots. Conversely, if spending on direct mailings did not return any leads, then a legacy approach would recommend to the advertiser to decrease spend on such direct mailings. Such legacy approaches are naïve in at least the following aspects:

-   Cross-channel influence. For example, the effect of spend on one channel might influence the effectiveness of another channel.
-   Constraints and Limits. Additional spending on a particular channel suffers from diminishing returns (e.g., the audience “tunes out” after hearing a message too many times).

Of course, an advertiser would want to accurately predict the overall effectiveness of a particular change in advertising spending, yet legacy prediction models fail to account for the aforementioned cross-channel effects and constraints.

What is needed is a technique or techniques that consider cross-channel effects and constraints. Further, what is needed is a technique or techniques that consider cross-channel effects and constraints even when direct measurement of the effectiveness of a channel is not available.

None of the aforementioned legacy approaches achieve the capabilities of the herein-disclosed techniques for media spend optimization using a cross-channel predictive model. There is a need for improvements.

SUMMARY

The present disclosure provides an improved method, system, and computer program product suited to address the aforementioned issues with legacy approaches. More specifically, the present disclosure provides a detailed description of techniques used in methods, systems, and computer program products for media spend optimization using a cross-channel predictive model.

A method, system, and computer program product for advertising portfolio management is disclosed. The method forms process steps for determining the effectiveness of marketing stimulations in a plurality of marketing channels included in a marketing campaign. The method commences upon receiving data comprising a plurality of marketing stimulations and respective measured responses, then determining from the marketing stimulations and the respective measured responses, a set of cross-channel weights to apply to the respective measured responses, where the cross-channel weights are indicative of the influence that a particular stimulation applied to a first channel has on the measured responses of other channels. The cross-channel weights are used in calculating the effectiveness of a particular marketing stimulation over an entire marketing campaign. The marketing campaign can comprise stimulations quantified as a number of direct mail pieces, a number or frequency of TV spots, a number of web impressions, a number of coupons printed, etc.

Further details of aspects, objectives, and advantages of the disclosure are described below and in the detailed description, drawings, and claims. Both the foregoing general description of the background and the following detailed description are exemplary and explanatory, and are not intended to be limiting as to the scope of the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 depicts an environment for practicing media spend optimization using a cross-channel predictive model, according to some embodiments.

FIG. 2 presents a portfolio schematic showing multiple channels as used in systems for media spend optimization using a cross-channel predictive model, according to some embodiments.

FIG. 3 depicts a multi-channel campaign execution plan to be prosecuted using media spend optimization using a cross-channel predictive model, according to some embodiments.

FIG. 4A is a chart depicting vectors formed from time-series of scalars as used in forming a cross-channel predictive model, according to some embodiments.

FIG. 4B is a correlation chart showing time- and value-based correlations as used to form a cross-channel predictive model, according to some embodiments.

FIG. 5A depicts an unsupervised model training flow resulting in a baseline trained model, according to some embodiments.

FIG. 5B depicts a supervised model validation flow resulting in a learning model, according to some embodiments.

FIG. 6A and FIG. 6B depict a model development flow used to develop simulation models for use in systems for media spend optimization using a cross-channel predictive model, according to some embodiments.

FIG. 7 depicts a true model data structure used in systems for media spend optimization using a cross-channel predictive model, according to some embodiments.

FIG. 8 is a block diagram of a subsystem for populating a true model data structure as used in systems for media spend optimization using a cross-channel predictive model, according to some embodiments.

FIG. 9 is a block diagram of a subsystem for calculating cross-channel contributions as used in systems for media spend optimization using a cross-channel predictive model, according to some embodiments.

FIG. 10 is a data flow diagram for generating true scores based on cross-channel responses, according to some embodiments.

FIG. 11 depicts a true metrics report based on the true scores, according to some embodiments.

FIG. 12 is a block diagram of a system for optimizing media spend using a cross-channel predictive model, according to some embodiments.

FIG. 13 depicts a block diagram of an instance of a computer system suitable for implementing an embodiment of the present disclosure.

DETAILED DESCRIPTION

Overview

In many forms of advertising media, stimulus and response can be measured only indirectly or can be determined only in aggregate. For example, a radio ad in the form of “Call 1-800-123-4567 today for this buy-1-get-two-free offer” might be broadcasted to three million morning commuters, but which specific commuters have heard the spot cannot be determined directly. Indirectly, however, the effectiveness of the spot can be measured by tallying the number of calls into “1-800-123-4567”. Or, again indirectly, the effectiveness of the spot can be measured by running an experiment to see if an increase in the frequency of the radio spots entices commensurately more listeners to send in a “prepaid inquiry postcard” they received in a direct mailing.

The problem of determining the effect of one or another type of advertising (e.g., by media, by channel, by time-of-day, etc.) has long been studied, yet legacy approaches fall short. Legacy approaches rely on a naïve one-to-one correspondence between an advertising placement and a measured response. If an increase in a particular spend (e.g., a radio spot) results in more responses (e.g., more calls to the broadcasted 1-800 number) then a legacy approach would recommend to the advertiser to increase spend on those radio spots. Conversely, if spending on direct mailings did not return any leads, then a legacy approach would recommend to the advertiser to decrease or eliminate spending on such direct mailings. Such legacy approaches are naïve in at least that they fail to consider the following aspects:

-   Cross-channel influence from more spending. For example, the effect of spending more on TV ads might influence viewers to “log in” (e.g., to access a website) and take a survey or download a coupon.
-   Cross-channel effects that are counter-intuitive in a single channel model. For example, additional spending on a particular channel often suffers from measured diminishing returns (e.g., the audience “tunes out” after hearing a message too many times). Placement of a message can reach a “saturation point” beyond which point further desired behavior is not apparent in the measurements in the same channel. However, additional spending beyond the single-channel saturation point may correlate to improvements in other channels.

An advertiser would want to accurately predict the overall effectiveness of a particular change to the advertiser's ad placement portfolio, yet legacy prediction models fail to account for the aforementioned cross-channel effects.

The cross-channel effects become complex quickly. An advertiser's portfolio might be comprised of a mixture of many placements across a mixture of media outlets. In typical scenarios, an advertiser would advertise using several channels, where each channel is intended to deliver a particular effect. Strictly as examples, the effects considered by advertisers can be classified into three categories: (1) introducers, (2) influencers, and (3) converters.

Continuing this example, introducers provide the first exposure of a brand, product, or promotion to a consumer. An influencer keeps the advertised brand, product, or promotion at the forefront of the consumer's consciousness. Converters directly provoke a user to purchase the advertised product or service. For example, an Internet advertisement may offer a discount to consumers who purchase the advertised product by clicking the advertisement. Each of these types of channels and their respective stimuli have unique strengths and weaknesses and a mixture of such channels and their respective stimuli are often found in successful advertising spend portfolios. Commonly, the mixture of channels and their respective stimuli encompass many tens or hundreds (or more) of placements, each having a corresponding measurement technique. When considering that changing spend in one channel would affect or influence a second channel, and that influences on the second channel could in turn affect a third channel, and so on, it becomes clear that a naïve model falls short.

Advertisers want to accurately predict the overall effectiveness of a portfolio of spends. In particular, advertisers want to accurately forecast the overall effectiveness of a mix of advertising spending (e.g., a portfolio of spends) given a proposed change in spending into one or more channels.

Disclosed herein are modeling techniques that consider intra-channel effects (e.g., saturation, amplification) as well as inter- or cross-channel effects and constraints. Also, disclosed herein are modeling techniques that result in models that accurately forecast the overall effectiveness of a media spending portfolio given a proposed change in the media spending ratios in the portfolio. Further discussed herein are techniques to decrease spending while increasing return on investment (ROI).

DEFINITIONS

Some of the terms used in this description are defined below for easy reference. The presented terms and their respective definitions are not rigidly restricted to these definitions—a term may be further defined by the term's use within this disclosure.

-   The term “exemplary” is used herein to mean serving as an example, instance, or illustration. Any aspect or design described herein as “exemplary” is not necessarily to be construed as preferred or advantageous over other aspects or designs. Rather, use of the word exemplary is intended to present concepts in a concrete fashion.
-   As used in this application and the appended claims, the term “or” is intended to mean an inclusive “or” rather than an exclusive “or”. That is, unless specified otherwise, or is clear from the context, “X employs A or B” is intended to mean any of the natural inclusive permutations. That is, if X employs A, X employs B, or X employs both A and B, then “X employs A or B” is satisfied under any of the foregoing instances.
-   The articles “a” and “an” as used in this application and the appended claims should generally be construed to mean “one or more” unless specified otherwise or is clear from the context to be directed to a singular form.

Reference is now made in detail to certain embodiments. The disclosed embodiments are not intended to be limiting of the claims.

Descriptions of Exemplary Embodiments

FIG. 1 depicts an environment 100 for practicing media spend optimization using a cross-channel predictive model. As an option, one or more instances of environment 100 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein.

One approach to advertising portfolio optimization uses marketing attributions and predictions determined from historical data. Analysis of the historical data can serve to infer relationships between marketing stimulations and responses. In some cases, the historical data comes from “online” outlets, and is comprised of individual user-level data, where a direct cause-effect relationship between stimulations and responses can be verified. However, “offline” marketing channels, such as television advertising, are of a nature such that indirect measurements are used when developing models used in media spend optimization. For example, some stimuli are described as an aggregate (e.g., “TV spots on Prime Time News, Monday, Wednesday and Friday”) that merely provides a description of an event or events as a time-series of marketing stimulations (e.g., weekly television advertising spends). Responses to such stimuli are also often measured and/or presented in aggregate (e.g., weekly unit sales reports provided by the telephone sales center). Yet, correlations, and in some cases causality and inferences, between stimulations and responses can be determined via statistical methods.

As shown in FIG. 1, stimuli 102 arise from a portfolio of spends (e.g., portfolio 103). The stimuli comprise various “spots” or “placements” (e.g., TV spots, radio spots, print media mailer, web banner ads, etc.). The stimuli are presented to the marketplace and undergo marketplace dynamics resulting in responses 106. Generally, and as shown, at least one response measurement is attempted for each stimulus, which attempt may result in one or more measured responses 108. For example, a “TV Prime Time News” placement might be measured by a “Nielsen Household Share” metric.

In collecting historical data, any series of stimuli 102 from portfolio spends can be considered to be known stimuli 110, and any responses 106 from the observations can be considered to be known responses 112. A model (e.g., learning model 116) can be formed using the historical data. The learning model 116 serves to predict a particular channel response from a particular channel stimulation. For example, if a radio spot from last week Saturday and Sunday resulted in some number of calls to the broadcasted 1-800 number, then the model can predict that additional radio spots next week Saturday and Sunday might result in the same number of calls to the broadcasted 1-800 number. Of course, there are often influences not included in such a model. For example, next Sunday might be Super Bowl Sunday, which might suggest that many people would be watching TV rather than listening to the radio. Such external factors can be included in a learning model, and incorporation of such external factors is further discussed below.

As earlier indicated, what is desired is a model that considers cross-channel effects even when direct measurements are not available. The simulated model 128 is such a model, and can be formed using any machine learning techniques and/or the operations shown in FIG. 1. Specifically, the embodiment of FIG. 1 shows a technique where variations (e.g., mixes) of stimuli are used with the learning model to capture predictions of what would happen if a particular portfolio variation (e.g., a mix of spends 111) were prosecuted. The learning model 116 produces a set of predictions (e.g., predictions 118 ₁, predictions 118 ₂, predictions 118 ₃, etc.), one set of predictions for each variation (e.g., variation 114 ₁, variation 114 ₂, variation 114 ₃, etc.). In this manner various variations of stimuli 120 produce predicted responses 122, which are used in weighting and filtering operations (e.g., see predictive model 124), which in turn result in a simulated model 128 being output that includes cross-channel predictive capabilities.
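Strictly as an illustrative sketch, and not as the disclosed implementation, the loop of FIG. 1 (form variations of a mix of spends, then capture one set of predictions per variation) might be expressed as follows. The linear stand-in for learning model 116, the weight values, and the names `WEIGHTS`, `predict`, and `make_variations` are all assumptions introduced only for this example.

```python
# Hypothetical cross-channel weights: WEIGHTS[stimulus][response] is the
# assumed response gained in a channel per unit of spend on a stimulus.
WEIGHTS = {
    "tv":    {"tv": 0.50, "radio": 0.00, "web": 0.20},
    "radio": {"tv": 0.00, "radio": 0.30, "web": 0.10},
    "web":   {"tv": 0.00, "radio": 0.00, "web": 0.40},
}


def predict(spends):
    """Assumed linear stand-in for a learning model: responses per channel."""
    responses = {channel: 0.0 for channel in WEIGHTS}
    for stimulus, amount in spends.items():
        for channel, weight in WEIGHTS[stimulus].items():
            responses[channel] += weight * amount
    return responses


def make_variations(baseline, factors):
    """Scale one channel at a time to form variations of the baseline mix."""
    variations = []
    for channel in baseline:
        for factor in factors:
            mix = dict(baseline)
            mix[channel] = baseline[channel] * factor
            variations.append(mix)
    return variations


baseline = {"tv": 100.0, "radio": 50.0, "web": 20.0}
variations = make_variations(baseline, factors=(0.5, 1.5))

# One set of predictions for each variation.
predictions = [predict(mix) for mix in variations]
for mix, predicted in zip(variations, predictions):
    print(mix, "->", predicted)
```

In this toy setting, raising only the TV spend also raises the predicted web response, which is the kind of cross-channel effect the simulated model is intended to capture.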

A simulated model that includes cross-channel predictive capabilities facilitates making cross-channel predictions from a user-provided scenario (e.g., scenario 130). A user 105 can further use the simulated model 128 to generate other reports (e.g., reports 132 ₁, reports 132 ₂, reports 132 ₃, etc.) based on a particular user-provided scenario. Strictly as one example, a report can come in the form of an ROI report that quantifies the return on investment of the particular mix of spends after considering cross-channel effects.

The mix of spends in portfolio 103 can encompass a wide range of channels over a wide range of media. Some such media and respective channels are presently discussed.

FIG. 2 presents a portfolio schematic 200 showing multiple channels as used in systems for media spend optimization using a cross-channel predictive model. As an option, one or more instances of portfolio schematic 200 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the portfolio schematic 200 or any aspect thereof may be implemented in any desired environment.

As shown, the portfolio schematic 200 includes three types of media, namely TV, radio 203, and print media 206. Under each media type are shown one or more spends. TV comprises stations named CH1 208 and CH2 210. Radio comprises a station named KVIQ 212. Print media comprises distribution in the form of mail, a magazine and/or a printed coupon. For each media shown, there are one or more stimulations (e.g., S1, S2, . . . SN) and their respective responses (e.g., R1, R2, R3 . . . RN). As shown, there is a one-to-one correspondence between a particular stimulus and its response. For example, the TV spot “Evening News” 214 is depicted with stimulus S1, and has a corresponding response R1. The stimuli and responses discussed herein are often formed as a time-series of individual stimulations and responses, respectively. For notational convenience a time-series is given as a vector, such as the shown vector S1.

Continuing the discussion of this portfolio schematic 200, the portfolio includes spends for TV in the form of evening news 214, weekly series 216, morning show 218. The portfolio also includes radio spends in the form of a sponsored public service announcement 220, a sponsored shock jock spot 222, and a contest 224. The portfolio includes spends for radio station KVIQ 212, a direct mailer 226, and magazine print ads 228 (e.g., coupon placement 229). The portfolio also includes spends for print media 206 in the form of coupons such as coupon 230 and in-store coupon 231, as shown.

The portfolio schematic includes a graphic depiction of stimulus events shown as stimulus vectors (e.g., S1 246, S2 248, S3 250, S4 252, S5 254, S6 256, S7 258, S8 260, and SN 262). The portfolio schematic 200 also shows a set of response measurements to be taken, shown as response vectors (e.g., R1 264, R2 266, R3 268, R4 270, R5 272, R6 274, R7 276, R8 278, and RN 280). As shown, channel 201 ₁ includes a measurement using Nielsen share 232, channel 201 ₂ includes a measurement using dial-in tweets 234, channel 201 ₃ includes a measurement using number of calls 236, and channel 201 _(N) includes a measurement using number of in-store purchases 244.

FIG. 3 depicts a multi-channel campaign execution plan to be prosecuted using media spend optimization using a cross-channel predictive model. As an option, one or more instances of campaign execution plan 300 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the campaign execution plan 300 or any aspect thereof may be implemented in any desired environment.

An advertising campaign might coordinate placements across many channels using many types of media. Coordination of media might include TV, radio 203, print media 206, web 302, and others. Any one of the available media types might be used as introducers 304, and/or as influencers 306, and/or as converters 308. Often certain marketing objectives (e.g., brand name introduction 310, brand name awareness 312, stimulate purchaser actions taken toward decision 314, etc.) can be met most efficiently using one or another particular type of media or combinations of media. For example, TV is often used as an introducer (e.g., to create brand reach), and print media is often used as an influencer (e.g., to transform brand awareness into some particular actions taken), and the web is often used as a converter (e.g., when the actions taken culminate in a purchase).

In many cases, there is a delay between a particular spend and expectation of a respective response. For example, if a direct mail flyer is mailed on a Saturday evening, it would be expected that responses cannot occur any time before the following Monday. In other cases, an expected response can be obtained even after the marketing spend has been terminated. Such a delayed response can occur for many reasons (e.g., due to factors such as brand equity etc.).

Modeling of such factors can be considered when developing models. In certain situations, the delays are present in a given pair of stimulus-response time-series (see FIG. 4A) and in some cases, delays can be automatically determined during correlation steps (see FIG. 4B).

As shown, the campaign schedule 316 staggers marketing actions over time in expectation of matching the spends to expected delays in response from earlier spends. For example, a mass mailing is undertaken at the earliest moment in the campaign (see Week₁) with the expectation of a mail system delay of a week or less. Then, one week later (see Week₂) TV and radio spots are run. During the prosecution of the campaign, a time-series of spends occurs, and a time-series of responses is observed. Such spends and observations can be codified (e.g., into a spreadsheet or a list or an array, etc.) and used as known stimuli 110 (e.g., in a time-series of stimulus scalars) and known responses 112 (e.g., in a time-series of response scalars).

FIG. 4A is a chart 4A00 depicting vectors formed from time-series of scalars as used to form a cross-channel predictive model. As an option, one or more instances of vectors or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein.

The shown vector S1 is comprised of a time-series. The time-series can be presented in a native time unit (e.g., weekly, daily) and can be apportioned over a different time unit. For example, stimulus S1 corresponds to a weekly spend for “Prime Time News” even though the stimulus to be considered actually occurs nightly (e.g., during “Prime Time News”). The weekly spend stimulus can be apportioned to a nightly stimulus occurrence. In some situations, the time unit in a time-series can be very granular (e.g., by the minute). Apportioning can be performed using any known techniques. Stimulus vectors (e.g., stimulus vector 202) and response vectors (e.g., response vector 204) can be formed from any time-series in any time units and can be apportioned to another time-series using any other time units.
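Strictly as an example, apportioning a weekly time-series over a daily time unit might be sketched as below. The function name `apportion_weekly_to_daily`, the optional per-day weights, and the sample spend values are hypothetical; the disclosure permits any known apportioning technique.

```python
def apportion_weekly_to_daily(weekly_spend, day_weights=None):
    """Split each weekly scalar into seven daily scalars.

    day_weights gives the relative share of each day (Monday first);
    when omitted, the weekly amount is spread evenly over the week.
    """
    if day_weights is None:
        day_weights = [1.0] * 7
    if len(day_weights) != 7:
        raise ValueError("day_weights must have exactly seven entries")
    total = sum(day_weights)
    if total <= 0:
        raise ValueError("day_weights must sum to a positive value")
    daily = []
    for week_amount in weekly_spend:
        daily.extend(week_amount * weight / total for weight in day_weights)
    return daily


# Two weeks of a weekly stimulus such as S1 (illustrative values).
weekly_s1 = [700.0, 1400.0]

# Even apportioning across all seven nights.
daily_s1 = apportion_weekly_to_daily(weekly_s1)

# Apportioning to weeknights only, assuming the spot does not air on weekends.
weeknight_s1 = apportion_weekly_to_daily(weekly_s1, [1, 1, 1, 1, 1, 0, 0])
```

Either way, the apportioned series sums to the same total as the weekly series, so the overall spend is preserved while the time unit changes.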

FIG. 4B is a correlation chart 4B00 showing time- and value-based correlations as used to form a cross-channel predictive model. As an option, one or more instances of correlation chart 4B00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the correlation chart 4B00 or any aspect thereof may be implemented in any desired environment.

A particular stimulus in a first marketing channel (e.g., S1) might produce corresponding results (e.g., R1). Additionally, a stimulus in a first marketing channel (e.g., S1) might produce results (or lack of results) as given by measured results in a different marketing channel (e.g., R3). Such correlation of results or lack of results can be automatically detected, and a scalar value representing the extent of correlation can be determined mathematically from any pair of vectors. In the discussions just below, the correlation of a time-series response vector is considered with respect to a time-series stimulus vector. Correlations can be positive (e.g., the time-series data moves in the same directions), or negative (e.g., the time-series data moves in the opposite directions), or zero (no correlation). Those skilled in the art will recognize there are many known-in-the-art techniques to correlate any pair of curves.

As shown, the vector S1 is comprised of a series of changing values (e.g., depicted by the regression-fitted series covering the curve 403). The response R1 is shown as curve 404. As can be appreciated, even though the curve 404 is not identical to the curve 403 (e.g., it has undulations in the tail) the curve 404 is substantially value-correlated to curve 403. Maximum value correlation 414 occurs when curve 404 is time-shifted by Δt amount of time relative to curve 403 (see time Δt graduations). The amount of correlation (see discussion infra) and amount of time shift can be automatically determined. Cross-channel correlations are presented in Table 1.

TABLE 1 Cross-correlation examples

| Stimulus Channel → Cross-channel | Description |
|---|---|
| S1 → R2 | No correlation. |
| S1 → R3 | Correlates if time shifted and attenuated |
| S1 → R4 | Correlates if time shifted and amplified |

In some cases, a correlation calculation can identify a negative correlation where an increase in a first channel causes a decrease in a second channel. Further, in some cases, a correlation calculation can identify an inverse correlation where a large increase in a first channel causes a small increase in a second channel. In still further cases, there can be no observed correlation (e.g., see curve 408), or in some cases correlation is increased when exogenous variables are considered (e.g., see curve R1^(E) 406).

In some cases a correlation calculation can hypothesize one or more causation effects. And in some cases correlation conditions are considered when calculating correlation such that a priori known conditions can be included (or excluded) from the correlation calculations.

Also, as can be appreciated, there is no correlation to the shown time-series R2. The curve 410 is substantially value-correlated (e.g., though scaled down) to curve 403, and is time-shifted by a second Δt amount of time relative to curve 403. The curve 412 is substantially value-correlated (e.g., though scaled up) to curve 403, and is time-shifted by a second Δt amount of time relative to curve 403.
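The automatic determination of a time shift and of attenuation or amplification, as characterized in Table 1, can be illustrated with a brute-force search over candidate shifts. The sketch below is one possible approach under stated assumptions: it searches only non-negative shifts, looks only for the largest positive correlation, and estimates the scale factor as a least-squares slope. The names `pearson`, `best_lag`, and `gain`, and the sample series, are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Correlation coefficient of two equal-length series (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def align(stimulus, response, lag):
    """Pair each stimulus sample with the response observed `lag` steps later."""
    return stimulus[:len(stimulus) - lag], response[lag:]


def best_lag(stimulus, response, max_lag):
    """Return (lag, r) for the shift in 0..max_lag with the highest correlation."""
    scored = [(pearson(*align(stimulus, response, lag)), lag)
              for lag in range(max_lag + 1)]
    r, lag = max(scored, key=lambda item: item[0])  # ties keep the smallest lag
    return lag, r


def gain(x, y):
    """Least-squares slope of y on x: below 1 is attenuated, above 1 is amplified."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / var_x


# Illustrative data: the response is the stimulus delayed by 2 steps and halved.
s1 = [0.0, 1.0, 4.0, 9.0, 4.0, 1.0, 0.0, 0.0, 0.0, 0.0]
r3 = [0.0, 0.0, 0.0, 0.5, 2.0, 4.5, 2.0, 0.5, 0.0, 0.0]

lag, r = best_lag(s1, r3, max_lag=4)
scale = gain(*align(s1, r3, lag))
print(f"lag={lag} r={r:.3f} scale={scale:.3f}")
```

For this data the search selects a shift of two steps with a correlation of approximately 1 and a scale factor of approximately 0.5, which corresponds to the “time shifted and attenuated” case.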

The automatic detection can proceed autonomously. In some cases correlation parameters are provided to handle specific correlation cases. In one case, the correlation between two time-series can be determined to a scalar value using Eq. 1.

$\begin{matrix} {r = \frac{{n{\sum{xy}}} - {\left( {\sum x} \right)\left( {\sum y} \right)}}{\sqrt{{n\left( {\sum x^{2}} \right)} - {\left( {\sum x} \right)^{2}\sqrt{{n\left( {\sum y^{2}} \right)} - \left( {\sum y} \right)^{2}}}}}} & (1) \end{matrix}$

where:

x represents components of a first time-series,

y represents components of a second time-series, and

n is the number of {x, y} pairs.
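
Strictly as an illustration, Eq. 1 and the automatic determination of the time shift can be sketched as follows. The function names `pearson_r` and `best_lag`, the `max_lag` parameter, and the sample series are assumptions introduced for this sketch and are not part of the disclosure; the lag search shown (try each candidate shift and keep the one with the highest correlation) is only one possible approach.

```python
import math

def pearson_r(x, y):
    """Correlation of two equal-length series, computed term by term as in Eq. 1."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("series must have equal length of at least 2")
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    denom = math.sqrt(n * sxx - sx ** 2) * math.sqrt(n * syy - sy ** 2)
    if denom == 0:
        return 0.0  # assumption: a constant series is treated as uncorrelated
    return (n * sxy - sx * sy) / denom

def best_lag(stimulus, response, max_lag):
    """Return (lag, r) for the shift at which the response best tracks the stimulus."""
    best = (0, pearson_r(stimulus, response))
    for lag in range(1, max_lag + 1):
        # Compare the stimulus with the response observed `lag` steps later.
        r = pearson_r(stimulus[:-lag], response[lag:])
        if r > best[1]:
            best = (lag, r)
    return best

# Hypothetical data: the response is the stimulus delayed by 2 steps and halved.
s1 = [0, 1, 4, 9, 4, 1, 0, 0, 0, 0]
r1 = [0, 0, 0, 0.5, 2, 4.5, 2, 0.5, 0, 0]
lag, r = best_lag(s1, r1, max_lag=4)
```

Because Eq. 1 is insensitive to scale, an attenuated or amplified response (compare the S1 → R3 and S1 → R4 rows of Table 1) still yields a high correlation once the time shift is applied.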

In some cases, while modeling a time-series, not all the scalar values in the time-series are weighted equally. For example, more recent time-series data values found in the historical data are given a higher weight as compared to older ones. Various shapes of weights to overlay a time-series are possible, and one exemplary shape is the shape of an exponentially decaying model.
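
As a minimal sketch of such a weighting, the following computes exponentially decaying weights and applies them in a weighted correlation. The `half_life` parameter, the function names, and the weighted form of the correlation are assumptions made for illustration; the disclosure does not prescribe a particular formula.

```python
import math

def decay_weights(n, half_life):
    """Weights for n observations ordered oldest to newest; the newest gets weight 1."""
    return [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]

def weighted_r(x, y, w):
    """Weighted correlation: each {x, y} pair counts in proportion to its weight."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    cov = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    vx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    vy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return cov / math.sqrt(vx * vy)

# Hypothetical series: the older half disagrees, the recent half agrees.
x = [1, 2, 3, 4, 5, 6]
y = [3, 2, 1, 4, 5, 6]
recent_heavy = weighted_r(x, y, decay_weights(len(x), half_life=1))
unweighted = weighted_r(x, y, [1.0] * len(x))
```

With these sample values the decayed weighting reports a stronger correlation than the unweighted calculation, since the recent, well-correlated observations dominate.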

Such correlation techniques can be used by a stimulus-response correlator in the context of developing predictive models. Techniques for training predictive models are introduced in FIG. 5A. Techniques for validating predictive models are introduced in FIG. 5B.

FIG. 5A depicts an unsupervised model training flow 5A00 resulting in a baseline trained model. As an option, one or more instances of unsupervised model training flow 5A00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the unsupervised model training flow 5A00 or any aspect thereof may be implemented in any desired environment.

As shown, a model developer module 504 includes a training set reader 506 and a stimulus-response correlator 508. The model developer module 504 takes as inputs a set of experiments 502 (e.g., pairs of stimulus and corresponding response measurements) and a set of exogenous variables 510. As earlier discussed, the exogenous variables serve to eliminate or attenuate effects that are deemed to be independent from the stimulus.

FIG. 5B depicts a supervised model validation flow 5B00 resulting in a learning model. As an option, one or more instances of supervised model validation flow 5B00 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the supervised model validation flow 5B00 or any aspect thereof may be implemented in any desired environment.

The operations as shown and discussed as pertaining to FIG. 5A produce a learning model 116. This learning model can be validated so as to achieve a confidence score and/or precision and recall values. In one case, a portion of the experiments 502 are provided as inputs to the learning model, and predictions 118 are captured. A model validator 518 compares the response predictions of the learning model to the actual response vectors as were captured empirically, and if a sufficient confidence and/or precision and/or recall is determined, then the model is deemed validated. In some cases changes might be indicated, and path 519 is taken for remedial steps. Remedial steps might include compiling additional experiments, and/or performing WIR with different parameters, and/or including or excluding exogenous variables, etc.

As described above, validations are performed on the model using historical data itself (e.g., where both the stimulus and response are measured data) to ensure goodness of fit and prediction accuracy. In addition to model validation using the training dataset, additional validation steps are performed to check prediction accuracy and to ensure the model is not just doing a data fitting.

Model validation can occur at any moment in time, and indeed, model validation can occur using the supervised model validation flow 5B00. For example, the model developer module 504 can update the learning model. In such a case, a training model can be trained using training data up to the latest available date, and that training model can in turn be used to predict the values in the historical data (e.g., data captured in the past). The error in the training model can then be calculated using statistical metrics.
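
The disclosure does not name the statistical metrics used to calculate error. As one hedged example, the sketch below uses root-mean-square error and mean absolute percentage error; the choice of metrics, the function names, and the `max_mape` tolerance are all assumptions.

```python
import math

def rmse(actual, predicted):
    """Root-mean-square error between measured and predicted responses."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

def mape(actual, predicted):
    """Mean absolute percentage error as a fraction; zero-valued actuals are skipped."""
    terms = [abs(p - a) / abs(a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("no non-zero actual values to compare against")
    return sum(terms) / len(terms)

def is_validated(actual, predicted, max_mape=0.15):
    """Deem the model validated when its error is within a (hypothetical) tolerance."""
    return mape(actual, predicted) <= max_mape

# Hypothetical measured responses and the model's predictions for the same periods.
measured = [100.0, 200.0, 150.0]
predicted = [110.0, 180.0, 150.0]
ok = is_validated(measured, predicted)
```

When the check fails, the remedial path (e.g., path 519) would be taken rather than accepting the model.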

As shown, model development and optimization is an iterative process (e.g., see decision 521 and path 519) involving updating the model with changes, and/or adjustments, and/or new or different exogenous variables (see discussion below), and/or newly captured stimulus/response data, etc. The iterations serve to ensure that the model behaves within tolerances with respect to predictive statistical metrics (e.g., as assessed using significance tests).

Exogenous Variables

Use of exogenous variables might involve considering seasonality factors or other factors that are hypothesized to impact, or known to impact, the measured responses. For example, suppose the notion of seasonality is defined using quarterly time graduations, and the measured data shows a significant deviation of a certain response in only one quarter (e.g., the 4^(th) quarter) from among a sequence of four quarters. In such a case, the exogenous variables 510 can define one variable that lumps together the 1^(st) through 3^(rd) quarters and a separate variable for the 4^(th) quarter. Conversely, the model developer module 504, and/or its input functions, may determine that for a certain response, there is no period that behaves significantly differently from other periods, in which case the seasonality is removed or attenuated for that response.
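
The quarterly example can be sketched as an indicator variable together with a simple test of whether the flagged period deviates enough to be kept. The relative-gap test and its `min_relative_gap` threshold are assumptions for illustration only; a statistical significance test could be substituted.

```python
from statistics import mean

def q4_indicator(quarters):
    """1 for 4th-quarter observations, 0 for the lumped 1st through 3rd quarters."""
    return [1 if q == 4 else 0 for q in quarters]

def keep_seasonality(responses, indicator, min_relative_gap=0.2):
    """Keep the seasonal variable only if the flagged period deviates from the rest."""
    flagged = [r for r, flag in zip(responses, indicator) if flag]
    others = [r for r, flag in zip(responses, indicator) if not flag]
    if not flagged or not others:
        return False
    baseline = mean(others)
    if baseline == 0:
        return mean(flagged) != 0
    return abs(mean(flagged) - baseline) / abs(baseline) >= min_relative_gap

# Hypothetical two years of quarterly responses with a 4th-quarter lift.
quarters = [1, 2, 3, 4, 1, 2, 3, 4]
responses = [100, 102, 98, 150, 101, 99, 100, 160]
indicator = q4_indicator(quarters)
seasonal = keep_seasonality(responses, indicator)
```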

FIG. 6A and FIG. 6B depict a model development flow 6A00 and a simulation model development flow 6B00. As an option, one or more instances of the flows or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein.

As shown, stimulus vectors S1 through SN are collected, and response vectors R1 through RN are collected and organized in one-to-one pairings (see operation 612). A portion of the collected pairs (e.g., pairs S1R1 through S3R3) can be used to train a learning model (see operation 614). A different portion of the collected pairs (e.g., pairs S4R4 through S6R6) can be used to validate the learning model (see operation 616). The processes of training and validating can be iterated (see path 620), perhaps using any of the model development techniques shown and described pertaining to FIG. 5A and FIG. 5B. Processing continues to operations depicted in FIG. 6B.
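
A minimal sketch of partitioning the collected pairs into a training portion and a validation portion follows. The helper name `split_pairs` and the use of string labels in place of actual vectors are assumptions for illustration.

```python
def split_pairs(pairs, n_train):
    """Split one-to-one stimulus/response pairs into training and validation portions."""
    if not 0 < n_train < len(pairs):
        raise ValueError("both portions must be non-empty")
    return pairs[:n_train], pairs[n_train:]

# Labels stand in for the stimulus and response vectors of six channels.
pairs = [(f"S{k}", f"R{k}") for k in range(1, 7)]
train, validate = split_pairs(pairs, n_train=3)
```

Keeping the two portions disjoint is what allows the validation step to indicate whether the model generalizes rather than merely fitting the training data.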

FIG. 6B depicts process steps used in the generation of a simulation model from a training model (see grouping 608). A cross-channel correlator 636 is used to carry out some or all of the following steps:

-   Run simulations of varying stimulus using the learning model to predict output value changes (e.g., responses) from the varied stimulation (see operation 622).
-   Using the simulations of operation 622, observe the changes in the responses in other channels (see operation 624). For example, and as shown, if only stimulus S1 is applied and varied across some range, the predicted response given as P2 is captured. A response in channel #2 (i.e., P2) to a stimulus variation over a channel #1 stimulus (i.e., S1′) is deemed to be a cross-channel effect. In some cases, the effect in a cross channel can be modeled as a linear response, and a cross-channel weight (e.g., W2) can be calculated and stored as a value. A weight value corresponding to the effect in channel #M from a stimulus in channel #N can be noted as W_(SNRM).
-   Weight values covering all combinations of stimulus-response pairs can be stored in a data structure (see operation 626). As shown, such a data structure can be organized to hold cross-channel response contributions 628 for each cross-channel simulation (e.g., the shown N by N 2D array) plus as many additional simulated values as are performed over a sweep. For example, if a training model captured data from N channels, and a stimulus value was swept over the range [−100% through 100%] in 20% increments, the data structure would have a 3^(rd) dimension for holding a weight value for each of the simulated variations of {−100%, −80%, −60%, −40%, −20%, 0%, +20%, +40%, +60%, +80%, and +100%}. A portion of such a data structure is given in FIG. 7.
-   Noisy values can be filtered out (see operation 630). Or, weight values that are above or below a particular threshold can be eliminated.

The resulting true scores 632 are used to predict the response of the entire system based on a particular simulation model (see operation 634).
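
The steps above can be sketched as follows, under stated assumptions: `toy_model` is a hypothetical linear stand-in for the learning model, the weight is taken as the change in predicted response per unit change in stimulus (one possible reading of the "linear response" case), and the filtering threshold is arbitrary.

```python
# Sweep of -100% through +100% in 20% increments, expressed as fractions.
DELTAS = [d / 100 for d in range(-100, 101, 20)]

def toy_model(stimuli):
    """Hypothetical stand-in for the learning model: a fixed linear map."""
    coeffs = [[2.0, 0.5, 0.0],   # coeffs[n][m]: response m per unit of stimulus n
              [0.0, 1.0, 0.0],
              [0.1, 0.0, 3.0]]
    return [sum(stimuli[n] * coeffs[n][m] for n in range(3)) for m in range(3)]

def sweep_weights(model, baseline, deltas):
    """Build an N x N x len(deltas) structure of weights W[n][m][k]."""
    n_ch = len(baseline)
    base_resp = model(baseline)
    weights = [[[0.0] * len(deltas) for _ in range(n_ch)] for _ in range(n_ch)]
    for n in range(n_ch):
        for k, d in enumerate(deltas):
            if d == 0 or baseline[n] == 0:
                continue  # no stimulus change, so no slope can be estimated
            varied = list(baseline)
            varied[n] = baseline[n] * (1 + d)  # vary only stimulus n
            resp = model(varied)
            for m in range(n_ch):
                weights[n][m][k] = (resp[m] - base_resp[m]) / (baseline[n] * d)
    return weights

def filter_noise(weights, threshold):
    """Zero out weights whose magnitude falls below the threshold."""
    return [[[w if abs(w) >= threshold else 0.0 for w in per_delta]
             for per_delta in per_stimulus] for per_stimulus in weights]

W = sweep_weights(toy_model, [10.0, 20.0, 30.0], DELTAS)
true_scores = filter_noise(W, threshold=0.2)
```

Here `W[0][1]` holds the effect of stimulus channel #1 on response channel #2 at each sweep value; a model with non-linear behavior would produce different weights at different sweep values, which is the reason for retaining the third dimension.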

Having a simulation model that is populated with true scores facilitates using the true score simulation model to predict the response of the entire system based on a particular stimulus (e.g., a prophetic stimulus or prophetic scenario of stimuli). The true score model can be used to model stimulus-response behavior including cross-channel effects (see operations corresponding to 610). For example, if an advertiser wants to know what would be the effect on coupon redemptions if the frequency of radio spots were increased, then the advertiser would use a true score simulation model to predict the response of the entire system based on a particular stimulus of increased frequency of radio spots. Also, the advertiser can use the true score simulation model to predict the overall campaign response based on a plurality of changed stimulations. Or, an advertiser can carry out an experiment in the past. For example, if an advertiser wants to know what would have been the overall campaign effect of doubling last quarter's TV spots, the advertiser can use the true score simulation model to get an answer to what would have happened.
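
As a hedged illustration of such a what-if query, the sketch below applies stored weights to a set of proposed stimulus changes. The channel names, the weight values, and the assumption that effects add linearly across stimuli are all hypothetical.

```python
def predict_response_change(weights, stimulus_changes):
    """Sum, per response channel, each weight times the change in its stimulus."""
    totals = {}
    for (stimulus, response), weight in weights.items():
        change = stimulus_changes.get(stimulus, 0.0)
        totals[response] = totals.get(response, 0.0) + weight * change
    return totals

# Hypothetical true scores: response units gained per additional unit of stimulus.
weights = {
    ("radio", "coupons"): 1.5,
    ("radio", "calls"): 0.25,
    ("tv", "coupons"): 4.0,
}
# Scenario: ten additional radio spots, all other stimuli unchanged.
effect = predict_response_change(weights, {"radio": 10})
```

The same function can be called with several changed stimuli at once to estimate an overall campaign response, or with last quarter's figures to answer a retrospective question.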

FIG. 7 depicts a true model data structure 700 used in systems for media spend optimization using a cross-channel predictive model. As an option, one or more instances of true model data structure 700 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the true model data structure 700 or any aspect thereof may be implemented in any desired environment.

Earlier figures depict a data structure to hold true scores, the true scores comprising weights to characterize channel-by-channel responses from a particular stimulus. As shown in FIG. 7, the data structure comprises a stimulus ordinate 704, a response abscissa 706, and a dimension labeled as “deltas 702”. This organization provides storage space for weight values to be stored, each weight value being used to characterize channel-by-channel responses from a particular stimulus. More specifically, and as shown, the effect of stimulus S1 on cross-channel R2 can be held in such a data structure. Still more, any number of variations of S1 and corresponding effects on responses can be modeled. In the specific embodiment of FIG. 7, the variations shown correspond to an increase of 20%, an increase of 80%, a decrease of 20%, and a decrease of 80%.
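
One possible in-memory layout for such a structure is sketched below as a sparse mapping keyed by stimulus, response, and delta. The class name `TrueScores`, its methods, and the sample weight values are assumptions for illustration; a dense three-dimensional array is an equally valid choice.

```python
from dataclasses import dataclass, field

@dataclass
class TrueScores:
    """Sparse store of weights keyed by (stimulus, response, delta in percent)."""
    weights: dict = field(default_factory=dict)

    def put(self, stimulus, response, delta_pct, weight):
        self.weights[(stimulus, response, delta_pct)] = weight

    def get(self, stimulus, response, delta_pct, default=0.0):
        # Entries eliminated by filtering are simply absent and read as the default.
        return self.weights.get((stimulus, response, delta_pct), default)

    def deltas_for(self, stimulus, response):
        """Sorted sweep values stored for one stimulus-response pair."""
        return sorted(d for (s, r, d) in self.weights if s == stimulus and r == response)

scores = TrueScores()
# Hypothetical weights for the effect of S1 on cross-channel R2 at four variations.
for delta_pct, weight in [(20, 0.50), (80, 0.42), (-20, 0.55), (-80, 0.61)]:
    scores.put("S1", "R2", delta_pct, weight)
```

Integer percentages are used as keys in this sketch to avoid floating-point comparison issues when looking up a sweep value.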

FIG. 8 is a block diagram of a subsystem 800 for populating a true model data structure as used in systems for media spend optimization using a cross-channel predictive model. As an option, one or more instances of subsystem 800 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the subsystem 800 or any aspect thereof may be implemented in any desired environment.

As shown, the system can commence when a particular known stimulus is selected (see operation 802). Then a step to sweep over a range is entered (see operation 810). A particular sweep value (e.g., +20%, +40%, +80%, −20%, etc.) is selected and used as an input to a simulator 806, which in turn takes in the learning model 116. The simulator, in conjunction with the learning model, produces responses (see operation 812), and each response can be captured. A series of simulations may comprise many selections of known stimuli, and a given stimulus may have a sweep range that comprises many steps, thus a decision 816 determines if there are more simulations to be performed. If so, processing continues to perform simulations over more sweep values or to perform simulations over more selected stimuli (see decision 814). When decision 816 deems that there are no more simulations to be performed, then a step is entered to observe outputs of the simulations to compare changes in model responses given the delta simulations (see operation 818). The simulated responses 826 are observed, and weight values are calculated (e.g., using a linear apportioning). The weight values are checked against one or more thresholds (see operation 820), and some weight values (e.g., weight values smaller than a threshold) can be eliminated. Remaining weight values are saved in a data structure as true scores (see operation 822). The resulting data structure is used as a constituent of simulated model 128.

FIG. 9 is a block diagram of a subsystem 900 for calculating cross-channel contributions. As an option, one or more instances of subsystem 900 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the subsystem 900 or any aspect thereof may be implemented in any desired environment.

The above discussion of FIG. 8 describes steps to observe outputs of the simulations to compare changes in model responses given the simulated responses. The simulated responses 826 are observed, and the contribution in a response channel is calculated based on the stimulus. Specifically, and as shown in FIG. 9, the contribution in a response channel resulting from a particular stimulus can be determined by comparing the response with a delta variation to the response absent the delta variation.

FIG. 9 depicts a sample partitioning of operations to determine cross-channel effects over all stimuli and over all channels for the selected attribute. In this partitioning, the operations are divided into two partitions, namely:

-   a first partition being a weight determinator 920, and
-   a second partition being a weight filter 930.

Operations in the partitions cooperate in a manner that results in true scores 126.

Continuing with the discussion of FIG. 9, and as shown, calculating cross-channel contributions commences upon selecting a particular attribute (e.g., spend) (see operation 901), and then selecting a stimulation vector SVi that relates to the selected attribute (see operation 902). Strictly as examples, a particular stimulation vector SVi (e.g., placement of “TV spots on Prime Time News”) might be selected since it directly relates to the attribute (spend on TV spots). Or, a particular stimulation vector SVi (e.g., “placements of flysheet ads”) might be selected since it relates to a particular attribute (spend on newspaper spots).

The calculation of cross-channel contributions continues by entering a comparison loop 904 within which loop the following steps are taken:

-   Select a response vector RVj (see step 906). Response vectors RVj (where j is not equal to i) are deemed to be cross-channel response vectors. The cross-channel response vectors are used in the analysis of step 908.
-   Step 908 serves to calculate and store any contribution in response vector RVj resulting from stimulus vector SVi. As earlier indicated, a stimulus vector SVi might be a stimulus vector as provided to the model, or a stimulus vector SVi might be a stimulus vector that has been apportioned by a sweep operation.
-   The result of comparison calculations can be stored in a data structure comprising simulated responses and cross-channel response contributions 628.
-   If there are more cross channels to consider (see decision 912), then path 914 is taken.
-   If there are more stimulus vectors to consider (see decision 916), then path 918 is taken.
-   When the comparison loop exits (e.g., there are no more stimulus vectors to consider), then processing proceeds to filtering operations (see operation 931).

Operation 931 serves to select-in (or eliminate-out) sufficiently high (or sufficiently low) contributions to generate true scores of contributions. The true scores 126 are stored in a data structure.
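
The comparison loop and the subsequent filtering can be sketched as follows. The contribution is taken as the response with the delta variation minus the response absent the delta variation, as described for FIG. 9; the dictionary layout, channel names, sample values, and `min_abs` threshold are assumptions for illustration.

```python
def cross_channel_contributions(baseline_resp, varied_resp_by_stimulus):
    """For each varied stimulus channel i, compare every other channel j to baseline."""
    contributions = {}
    for i, varied_resp in varied_resp_by_stimulus.items():
        for j, value in varied_resp.items():
            if j == i:
                continue  # same-channel response is not a cross-channel effect
            contributions[(i, j)] = value - baseline_resp[j]
    return contributions

def select_in(contributions, min_abs):
    """Keep only contributions large enough in magnitude to count as true scores."""
    return {pair: c for pair, c in contributions.items() if abs(c) >= min_abs}

# Hypothetical simulated responses: baseline, and with the TV stimulus varied.
baseline = {"tv": 100.0, "mail": 50.0, "web": 80.0}
varied = {"tv": {"tv": 130.0, "mail": 58.0, "web": 80.5}}
contributions = cross_channel_contributions(baseline, varied)
true_scores = select_in(contributions, min_abs=1.0)
```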

FIG. 10 is a data flow diagram 1000 for generating true scores based on cross-channel responses. As an option, one or more instances of diagram 1000 or any aspect thereof may be implemented in the context of the architecture and functionality of the embodiments described herein. Also, the diagram 1000 or any aspect thereof may be implemented in any desired environment.

A computer-implemented method can execute the data flow diagram 1000. The shown flow can be used in determining the effectiveness of marketing stimulations (e.g., stimulus vector 202) in a plurality of marketing channels (e.g., marketing channel 201₁, marketing channel 201₂, etc.). The flow proceeds upon receiving data comprising marketing stimulations (e.g., stimuli 102) and responses (e.g., responses 106). The marketing stimulations and respective measured responses can be received as sets of data in pairs (e.g., a one-to-one correspondence of a particular stimulus and its respective response) or as sets of data that have been aggregated (e.g., a many-to-one correspondence of a particular stimulus and a set of observed responses). The flow continues by determining from the marketing stimulations and the respective measured responses a set of cross-channel weights (e.g., cross-channel effectiveness weights 1008) to apply to the respective measured responses. As shown and discussed as pertaining to FIG. 6B, simulations of varying stimulus are conducted using the learning model to predict output value changes (e.g., responses) from the varied stimulations.

Using the aforementioned simulations, a weight determinator 920 observes the changes in the responses in cross-channels as a result of the varying stimulus. In some cases the cross-channel weights are filtered (e.g., using a weight filter 930) so as to eliminate small cross-channel weights, and/or to eliminate statistically insignificant cross-channel weights, and/or to eliminate statistically outlying cross-channel weights, etc. The remaining cross-channel weights are stored in a data structure. The remaining cross-channel weights are used in calculating an effectiveness value of a particular one of the marketing stimulations. As an example, the effect of spending on TV spots might influence the effectiveness of a direct mail campaign.

Of course, the foregoing example does not limit the generality. The marketing stimulations can come in the form of an advertising spend, a number of direct mail pieces, a number of TV spots, a number of radio spots, a number of web impressions, a number of coupons printed etc. Further, the measured responses can come in the form of a number of calls into a call center after a broadcast, a number of clicks on an impression, a number of coupon redemptions, etc.

FIG. 11 depicts a true metrics report based on the true scores. The shown true metrics report 1100 depicts various measures of attribution across channels. In this embodiment of a true metrics report, several channels are depicted (e.g., “TVOther”, “TVSynd”, “TVBET”, etc.). For each channel, a particular stimulus is depicted (e.g., dollars spent in a respective channel). The observed response in the same channel is also depicted (e.g., see the observed verification column).

Using the cross-channel true scores developed using the techniques described herein, a true contribution of the responses can be apportioned to the channels (see true responses based on true scores). In some embodiments, the contribution attributed to a particular channel is apportioned as a percent.

The shown true metrics report 1100 depicts a row labeled “Organic”. The organic row arises when it is determined that the entirety of the response cannot be completely attributed to the stimulated channels receiving the stimulus. In this example, the portion of the aggregate response that is not attributed to the aggregate stimulus is labeled as “Organic”; however, other labels are possible. As shown, the row “Organic” is included to account for aggregate responses that result from effects other than the stimulated channels. In this example, the organic effect amounts to 9.8% of the total.
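
As a minimal sketch of the apportioning, the following expresses each channel's attributed response as a percent of the total and reports the unattributed remainder under the “Organic” label. The function name and the sample figures are hypothetical and are not taken from FIG. 11.

```python
def apportion(total_response, attributed):
    """Percent of the total response per channel, plus the unattributed remainder."""
    if total_response <= 0:
        raise ValueError("total response must be positive")
    organic = total_response - sum(attributed.values())
    shares = {channel: 100.0 * value / total_response
              for channel, value in attributed.items()}
    shares["Organic"] = 100.0 * organic / total_response
    return shares

# Hypothetical totals: 1000 responses observed, 900 attributed to stimulated channels.
shares = apportion(1000.0, {"TV": 600.0, "Web": 300.0})
```

In this sketch the channel shares and the organic share always sum to 100 percent, so any response not explained by the stimulated channels remains visible in the report.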

Additional Practical Application Examples

FIG. 12 is a block diagram of a system for optimizing media spend using a cross-channel predictive model, according to some embodiments. As an option, the present system 1200 may be implemented in the context of the architecture and functionality of the embodiments described herein. Of course, however, the system 1200 or any operation therein may be carried out in any desired environment.

As shown, system 1200 comprises at least one processor and at least one memory, the memory serving to store program instructions corresponding to the operations of the system. As shown, an operation can be implemented in whole or in part using program instructions accessible by a module. The modules are connected to a communication path 1205, and any operation can communicate with other operations over communication path 1205. The modules of the system can, individually or in combination, perform method operations within system 1200. Any operations performed within system 1200 may be performed in any order unless as may be specified in the claims.

The embodiment of FIG. 12 implements a portion of a computer system, shown as system 1200, comprising a computer processor to execute a set of program code instructions (see module 1210) and modules for accessing memory to hold program code instructions to perform: receiving data comprising a plurality of marketing stimulations and respective measured responses (see module 1220); determining, from the marketing stimulations and the respective measured responses, cross-channel weights to apply to the respective measured responses (see module 1230); and calculating an effectiveness value of a particular one of the marketing stimulations using the cross-channel weights (see module 1240).

System Architecture Overview

FIG. 13 depicts a diagrammatic representation of a machine in the exemplary form of a computer system 1300 within which a set of instructions, for causing the machine to perform any one of the methodologies discussed above, may be executed. In alternative embodiments, the machine may comprise a network router, a network switch, a network bridge, Personal Digital Assistant (PDA), a cellular telephone, a web appliance or any machine capable of executing a sequence of instructions that specify actions to be taken by that machine.

The computer system 1300 includes a processor 1302, a main memory 1304 and a static memory 1306, which communicate with each other via a bus 1308. The computer system 1300 may further include a video display unit 1310 (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT)). The computer system 1300 also includes an alphanumeric input device 1312 (e.g., a keyboard), a cursor control device 1314 (e.g., a mouse), a disk drive unit 1316, a signal generation device 1318 (e.g., a speaker), and a network interface device 1320.

The disk drive unit 1316 includes a machine-readable medium 1324 on which is stored a set of instructions (i.e., software) 1326 embodying any one, or all, of the methodologies described above. The software 1326 is also shown to reside, completely or at least partially, within the main memory 1304 and/or within the processor 1302. The software 1326 may further be transmitted or received via the network interface device 1320.

It is to be understood that various embodiments may be used as or to support software programs executed upon some form of processing core (such as the CPU of a computer) or otherwise implemented or realized upon or within a machine or computer readable medium. A machine-readable medium includes any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer). For example, a machine-readable medium includes read-only memory (ROM); random access memory (RAM); magnetic disk storage media; optical storage media; flash memory devices; or any other type of non-transitory media suitable for storing or transmitting information.

A module as used herein can be implemented using any mix of any portions of the system memory, and any extent of hard-wired circuitry including hard-wired circuitry embodied as a processor 1302.

In the foregoing specification, the disclosure has been described with reference to specific embodiments thereof. It will, however, be evident that various modifications and changes may be made thereto without departing from the broader spirit and scope of the disclosure. For example, the above-described process flows are described with reference to a particular ordering of process actions. However, the ordering of many of the described process actions may be changed without affecting the scope or operation of the disclosure. The specification and drawings are, accordingly, to be regarded in an illustrative sense rather than restrictive sense. 

What is claimed is:
 1. A computer-implemented method for determining effectiveness of marketing stimulations in a plurality of marketing channels, the computer-implemented method comprising: receiving data comprising a plurality of marketing stimulations and respective measured responses; determining, from the marketing stimulations and the respective measured responses, cross-channel weights to apply to the respective measured responses; and calculating an effectiveness value of a particular one of the marketing stimulations using the cross-channel weights.
 2. The method of claim 1, wherein the marketing stimulations comprise at least one of, an advertising spend, a number of direct mail pieces, a number of TV spots, a number of radio spots, a number of web impressions, and a number of coupons printed.
 3. The method of claim 1, further comprising processing the marketing stimulations and respective measured responses to form a learning model.
 4. The method of claim 3, further comprising using the learning model to predict a portion of a response in a second channel resulting from a stimulus in a first channel.
 5. The method of claim 4 wherein using the learning model to predict a portion of a response in a second channel resulting from a stimulus in a first channel comprises running a plurality of simulations.
 6. The method of claim 5 wherein individual ones of the plurality of simulations comprise varying the stimulus in a first channel and observing the response in the second channel.
 7. The method of claim 5, further comprising outputting a simulated model.
 8. The method of claim 7, further comprising using the simulated model to generate one or more reports based on a user scenario.
 9. The method of claim 1, further comprising determining a portion of aggregate response that is not attributed to aggregate stimulus.
 10. A computer program product embodied in a non-transitory computer readable medium, the computer readable medium having stored thereon a sequence of instructions which, when executed by a processor causes the processor to execute a process, the process comprising: receiving data comprising a plurality of marketing stimulations and respective measured responses; determining, from the marketing stimulations and the respective measured responses, cross-channel weights to apply to the respective measured responses; and calculating an effectiveness value of a particular one of the marketing stimulations using the cross-channel weights.
 11. The computer program product of claim 10, wherein the marketing stimulations comprise at least one of, an advertising spend, a number of direct mail pieces, a number of TV spots, a number of radio spots, a number of web impressions, and a number of coupons printed.
 12. The computer program product of claim 10, further comprising instructions for processing the marketing stimulations and respective measured responses to form a learning model.
 13. The computer program product of claim 12, further comprising instructions for using the learning model to predict a portion of a response in a second channel resulting from a stimulus in a first channel.
 14. The computer program product of claim 13 wherein using the learning model to predict a portion of a response in a second channel resulting from a stimulus in a first channel comprises running a plurality of simulations.
 15. The computer program product of claim 14 wherein individual ones of the plurality of simulations comprise varying the stimulus in a first channel and observing the response in the second channel.
 16. The computer program product of claim 15, further comprising instructions for outputting a simulated model.
 17. The computer program product of claim 16, further comprising instructions for using the simulated model to generate one or more reports based on a user scenario.
 18. The computer program product of claim 10, further comprising determining a portion of aggregate response that is not attributed to aggregate stimulus.
 19. A computer system comprising: a computer processor to execute a set of program code instructions; and a memory to hold the program code instructions, in which the program code instructions comprises program code to perform, receiving data comprising a plurality of marketing stimulations and respective measured responses; determining, from the marketing stimulations and the respective measured responses, cross-channel weights to apply to the respective measured responses; and calculating an effectiveness value of a particular one of the marketing stimulations using the cross-channel weights.
 20. The computer system of claim 19, wherein the marketing stimulations comprise at least one of, an advertising spend, a number of direct mail pieces, a number of TV spots, a number of radio spots, a number of web impressions, and a number of coupons printed. 